
Conversation


@ultramancode ultramancode commented Nov 3, 2025

Fixes #4790

Problem

When using OpenAI-compatible APIs such as Qwen with streaming tool calls, subsequent chunks may not include the tool call ID. The current MessageAggregator uses addAll(), which creates separate, incomplete ToolCall objects for each chunk instead of merging them. This results in ToolCall objects with empty name fields, causing:
IllegalArgumentException: toolName cannot be null or empty

Root Cause

Some OpenAI-compatible APIs (e.g., Qwen via OpenRouter) follow a streaming pattern where:

  • First chunk: Contains both id and function.name
  • Subsequent chunks: Contain only function.arguments without id

Example:

Chunk 1: ToolCall(id="tool-123", name="getCurrentWeather", args="")
Chunk 2: ToolCall(id="",        name="",                  args="{\"location\": \"")
Chunk 3: ToolCall(id="",        name="",                  args="Seoul\"}")

Solution

Added a mergeToolCalls() method in MessageAggregator as a safety net to handle tool call fragments that may not be properly merged at the API layer (e.g., in OpenAiStreamFunctionCallingHelper).

This ensures that even when API-layer merging is incomplete or providers behave slightly differently, the aggregation layer can properly merge streaming tool call fragments.

This handles:

  • Standard ID-based matching (existing behavior)
  • ID-less streaming chunks
  • Multiple simultaneous tool calls
  • Mixed ID/no-ID scenarios
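The merging rule described above can be sketched as follows. This is an illustrative sketch, not the PR's actual patch: ToolCall here is a simplified stand-in for Spring AI's AssistantMessage.ToolCall record (the real record also carries a type field), and the method bodies are assumptions based on the behavior described in this PR.

```java
import java.util.ArrayList;
import java.util.List;

public class ToolCallMergeSketch {

    // Simplified stand-in for Spring AI's AssistantMessage.ToolCall record.
    record ToolCall(String id, String name, String arguments) {}

    // Merge streaming fragments: a chunk with an id starts (or continues) a
    // tool call; an ID-less chunk is appended to the most recent tool call.
    static List<ToolCall> mergeToolCalls(List<ToolCall> current, List<ToolCall> incoming) {
        List<ToolCall> merged = new ArrayList<>(current);
        for (ToolCall chunk : incoming) {
            int matchIdx = -1;
            if (!isEmpty(chunk.id())) {
                for (int i = 0; i < merged.size(); i++) {
                    if (chunk.id().equals(merged.get(i).id())) {
                        matchIdx = i;
                        break;
                    }
                }
            } else if (!merged.isEmpty()) {
                matchIdx = merged.size() - 1; // ID-less fragment: attach to the last tool call
            }
            if (matchIdx >= 0) {
                merged.set(matchIdx, mergeToolCall(merged.get(matchIdx), chunk));
            } else {
                merged.add(chunk); // new tool call
            }
        }
        return merged;
    }

    // Null-safe property merge: keep the first non-empty id/name, concatenate arguments.
    static ToolCall mergeToolCall(ToolCall base, ToolCall chunk) {
        String id = !isEmpty(base.id()) ? base.id() : chunk.id();
        String name = !isEmpty(base.name()) ? base.name() : chunk.name();
        String args = nullToEmpty(base.arguments()) + nullToEmpty(chunk.arguments());
        return new ToolCall(id, name, args);
    }

    static boolean isEmpty(String s) { return s == null || s.isEmpty(); }
    static String nullToEmpty(String s) { return s == null ? "" : s; }
}
```

Feeding the three Qwen-style chunks from the example above through this sketch yields a single ToolCall with the id and name from chunk 1 and the concatenated arguments from chunks 2 and 3.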

Changes

  • Replaced addAll() with new mergeToolCalls() method to properly handle streaming tool call fragments
  • Added mergeToolCall() helper method for null-safe property merging
  • Added comprehensive tests in MessageAggregatorTests
    • shouldMergeToolCallsWithoutIds: Verifies Qwen streaming pattern
    • shouldMergeMultipleToolCallsWithMixedIds: Multiple tool calls
    • shouldMergeToolCallsById: ID-based matching still works

Testing

All tests pass; the actual Qwen streaming response pattern was verified via the OpenRouter API.

Example:

// Input: Streaming chunks
Chunk 1: ToolCall(id="tool-123", name="getCurrentWeather", args="")
Chunk 2: ToolCall(id="",        name="",                  args="{\"location\": \"")
Chunk 3: ToolCall(id="",        name="",                  args="Seoul\"}")

// Output: Merged result
ToolCall(id="tool-123", name="getCurrentWeather", args="{\"location\": \"Seoul\"}")

- Update MessageAggregator to handle tool calls without IDs
- When tool call has no ID, merge with last tool call
- Add comprehensive tests for streaming patterns

Signed-off-by: ultramancode <[email protected]>
@ultramancode ultramancode force-pushed the fix/streaming-tool-calls-merge branch from bf2c9ce to d2057f3 on November 4, 2025 at 06:18
@ilayaperumalg
Member

@ultramancode Thanks for the PR!

A few questions to understand the fix:

  1. Do you handle the tool calling at your application level instead of through Spring AI?
  2. The MessageAggregator handles the Flux of ChatResponses later in the lifecycle, where the AssistantMessage wouldn't have the tool calls. I am trying to understand why you would receive chunked assistant messages with tool calls at the MessageAggregator level.
  3. Given that you successfully managed to reproduce this behaviour, is it possible to add an integration test that reproduces this issue, or to point to a specific model I can try to reproduce this behavior with?

Thanks

@ultramancode
Author

@ilayaperumalg Thanks for the detailed questions!

  1. No, I'm using standard Spring AI flow. The issue occurs within Spring AI's internal streaming processing.
  2. You're absolutely right to question this! In normal flow, MessageAggregator shouldn't receive incomplete tool calls. However, in the case described below, windowing fails and incomplete chunks leak through.
  3. This bug is environment-dependent, making it difficult to reproduce reliably with live API calls. I've created unit tests based on the response pattern (MessageAggregatorTests). If you'd like to verify the bug scenario directly, simulating the streaming chunks from the issue log would be the most reliable approach.

Evidence from issue #4790

Chunk 1: `id="call_f7e76b4bdf8242b68b7124"`, `name="init_work_status"`, `arguments=""`
Chunk 2: `id="call_f7e76b4bdf8242b68b7124"`, `name=""` ← Empty!, `arguments="{\\"firstStep"`
Chunk 3: `id=""` ← Empty!, `name=""` ← Empty!, `arguments="\\": \\"开始需求"`

The symptom: These incomplete chunks reached MessageAggregator as separate ChatResponses, causing:

java.lang.IllegalArgumentException: toolName cannot be null or empty
at DelegatingToolCallbackResolver.resolve
at MessageAggregator.aggregate(MessageAggregator.java:91)

This proves windowing failed. If OpenAiApi.windowUntil() worked correctly, all chunks would have been merged before reaching MessageAggregator.


Why windowing fails with Qwen

The windowUntil() logic relies on detecting tool call completion:

return choice.finishReason() == ChatCompletionFinishReason.TOOL_CALLS;

Qwen's streaming either doesn't send the correct finishReason, or has other incompatibilities preventing proper window detection.
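The failure mode can be illustrated with a toy model of the windowing pipeline. This is not the actual OpenAiApi code: Chunk, process(), and the pass-through-on-"stop" behavior are simplifying assumptions meant only to show why a wrong finishReason lets fragments leak through unmerged.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowingSketch {

    // Hypothetical simplified chunk: an arguments fragment plus finish reason (null if none).
    record Chunk(String args, String finishReason) {}

    // Toy model of the streaming pipeline: chunks accumulate into a window,
    // and the window is merged into one element only when the closing chunk
    // signals "tool_calls". Any other finish reason ends the window without
    // merging, so the fragments pass through individually.
    static List<String> process(List<Chunk> stream) {
        List<String> out = new ArrayList<>();
        List<Chunk> pending = new ArrayList<>();
        StringBuilder window = new StringBuilder();
        for (Chunk c : stream) {
            pending.add(c);
            window.append(c.args());
            if ("tool_calls".equals(c.finishReason())) {
                out.add(window.toString()); // recognized tool-call window: merged
                pending.clear();
                window.setLength(0);
            } else if (c.finishReason() != null) {
                // e.g. Qwen sending "stop": the window is never recognized as
                // a tool-call window, so fragments leak through unmerged
                for (Chunk p : pending) out.add(p.args());
                pending.clear();
                window.setLength(0);
            }
        }
        return out;
    }
}
```

With finishReason = "tool_calls" the fragments come out as one merged string; with Qwen's "stop" the same fragments come out as separate elements, which is the shape MessageAggregator then receives.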

Related ecosystem issue:

  • Qwen3-Coder #180 - Reports finishReason = "stop" instead of "tool_calls"; function calling requires non-standard workarounds (e.g., specific system prompts), and behavior is inconsistent across deployments

Interestingly, Qwen's official spec documents finishReason = "tool_calls" as standard, but the implementation (especially via vLLM/OpenRouter) doesn't match the specification.

So windowing fails → incomplete chunks leak through → MessageAggregator receives them.


Why fix in MessageAggregator instead of OpenAiApi windowing?

Windowing approach (problematic):

  1. Must distinguish "Qwen's buggy stop" from "legitimate stop"
  2. Requires heuristics that could have false positives
  3. Risk breaking existing OpenAI behavior
  4. OpenAI-only - doesn't help other models

MessageAggregator approach (safe):

  1. No distinction needed - just defensively merges any ToolCall chunks
  2. Idempotent - already-merged ToolCalls pass through unchanged
  3. No false positives - normal single ToolCall triggers no merging
  4. Universal - protects all models that use MessageAggregator

Example scenarios:

Normal OpenAI (windowing works):

  • Chunks merge correctly → 1 complete ToolCall → MessageAggregator sees it → passes through

Buggy Qwen (windowing fails):

  • Chunks don't merge → 3 incomplete ToolCalls → MessageAggregator merges them → 1 complete ToolCall

-> MessageAggregator acts as a safety net without needing to know if upstream processing succeeded or failed.
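The two scenarios above hinge on the idempotence and no-false-positive claims, which can be checked with a small self-contained sketch. Again, this is an assumption-laden illustration (the ToolCall record and merge() are stand-ins, not the PR's code): a complete tool call is a single-element run and passes through unchanged, while ID-less fragments fold into the preceding call.

```java
import java.util.ArrayList;
import java.util.List;

public class SafetyNetSketch {

    // Simplified stand-in for Spring AI's AssistantMessage.ToolCall.
    record ToolCall(String id, String name, String args) {}

    // Defensive merge: a fragment with no id continues the previous tool call;
    // anything with an id starts a new one. Already-complete tool calls are
    // single-element runs, so they pass through unchanged (idempotent).
    static List<ToolCall> merge(List<ToolCall> chunks) {
        List<ToolCall> out = new ArrayList<>();
        for (ToolCall c : chunks) {
            if (!out.isEmpty() && (c.id() == null || c.id().isEmpty())) {
                ToolCall last = out.remove(out.size() - 1);
                String name = last.name().isEmpty() ? c.name() : last.name();
                out.add(new ToolCall(last.id(), name, last.args() + c.args()));
            } else {
                out.add(c);
            }
        }
        return out;
    }
}
```

A complete ToolCall in, the same ToolCall out; three Qwen-style fragments in, one merged ToolCall out — no distinction between "upstream succeeded" and "upstream failed" is ever needed.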


A real-world concern: Integrating with existing LLM infrastructure

This isn't about supporting "broken" APIs; it's about bridging the gap between specification and implementation in real-world enterprise environments.

A scenario I'm concerned about (based on my experience):

I recently worked on an AI agent solution for a client where:

  • Existing systems were already running on their deployed Qwen model
  • The LLM infrastructure was managed by a separate team (not under our control)
  • We had to integrate with the existing model, not replace it

The constraint:

  • Cannot replace the model (other systems depend on it)
  • Cannot modify vLLM/serving layer (managed by infrastructure team)
  • Cannot control model deployment decisions (made at organizational level)
  • Can only control the integration layer

While I didn't use Spring AI for that project, I'm concerned that Spring AI users will face the same situation: needing to integrate with pre-existing LLM infrastructure that has implementation gaps, with no ability to fix upstream components.

Thanks!


Successfully merging this pull request may close these issues.

[Bug] Streaming tool_calls with Qwen models cause toolName cannot be null or empty in Spring AI 1.0.3
